Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VPC Subnet Routing [2/2] -- Custom Routers and NIC 'transit IP' lists #5823

Merged
merged 77 commits into from
Jun 26, 2024

Conversation

FelixMcFelix
Copy link
Contributor

@FelixMcFelix FelixMcFelix commented May 24, 2024

This PR builds on #5777 to provide the Custom routers for subnets as described in RFD21. This entails a few things:

  • We remove the unpublished = true tag from the user API for VPC routers and routes.
  • Custom routers may be attached/detached to a VPC subnet using the custom_router field in subnet POST and PUT requests.
  • NICs now individually have a transit_ips list, which denotes an additional set of CIDR blocks that a NIC is allowed to send and receive traffic on. This is set during POST and/or PUT on instances which are stopped. This is a key feature to enable software routing by instances, as today's default behaviour drops any packets not matching an assigned IP for an instance.
    • I suspect there will be some discussion over the shape of this API, so there isn't yet test coverage here until we know we're happy with it.
  • Revisited which router routes can be created by users, e.g., better validation on v4/v6 dest/target pairs.

There are some allowances around currently non-existent features:

  • Internet Gateways. We allow unlimited use of one pseudo-gateway, inetgw:outbound, which appears in our existing rules. Using this target sends packets upstream as it does today.
  • VPC peering. VPCs as destinations/targets are currently disallowed in router routes.

Closes #2116.

OPTE now prevents itself from being unloaded if its underlay state is
set. Currently, underlay setup is performed only once, and it seems to be
the case that XDE can be unloaded in some scenarios (e.g., `a4x2`
setup).

However, a consequence is that removing the driver requires an extra
operation to explicitly clear the underlay state. This PR adds this
operation to the `cargo xtask virtual-hardware destroy` command.

This is currently blocked on opte#485 being approved/merged.

Closes #5314.
These update in response to VPC subnet changes. Now to plumb them into
OPTE.
Currently there are no triggers attached to most of the operations that
will cause us to either a) push or b) re-resolve VPC routes, but this
lays the basis for sled-agent and the background task to talk in terms
of versions.
We'll get to that in the next PR.
@FelixMcFelix FelixMcFelix force-pushed the felixmcfelix/vpc-subnets-user branch from 495c69b to ab7223b Compare June 21, 2024 11:32
pub target: RouteTarget,
/// A CIDR block (or named subnet) which this route will apply to.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this accurate? I see a destination can be an IP, IP net, VPC, or subnet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking over this!

It can be any of those, except for VPCs -- that will likely relax in future with VPC peering, but it's not something that makes sense right now. I'll try and rework the wording for this, I maybe want to make it clearer that 'destination' matches on the address in routed packets.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, sounds good. I was asking because I saw more stuff on the RouteDestination type. But it's fine if there are things that are allowed by the type that are nonetheless rejected by the endpoint logic as invalid, and I see you're doing that nicely here.

/// Custom routers apply in addition to the VPC-wide *system* router, and have
/// higher priority than the system router for an otherwise
/// equal-prefix-length match.
pub custom_router: Option<NameOrId>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering if this could just be called router and then on the VpcSubnet view we can call it router_id, and we always include the ID. But I see now, "Custom routers apply in addition to the VPC-wide system router" means that doesn't make sense, right? Because when there is a custom router, there is still also the system router. It's not a default vs. custom situation. So I think this is good.

Copy link
Contributor

@david-crespo david-crespo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API side looks good to me. transit_ips vec seems fine. We can always tweak it after we see how it feels in practice.

pub target: RouteTarget,
/// The set of destination IP addresses or subnets that this route
/// will match packets against.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to keep nitpicking, but saying "set" and "subnets" might be confusing because this can only be one subnet. I can't think of a better wording — maybe rely on the definiton of RouteDestination to cover what is inside it, and here emphasize its role in the route. I like this bit in the doc comment on RouteDestination, though it's kind of long:

/// When traffic is to be sent to a destination that is within a given
/// `RouteDestination`, the corresponding `RouterRoute` applies, and traffic
/// will be forward to the `RouteTarget` for that rule.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the existing RouteDestination text is good there. I think if we lean into just showing its role and delegating the details to the type, I'd go for:

/// Selects which traffic this routing rule will apply to.
pub destination: RouteDestination,

@FelixMcFelix
Copy link
Contributor Author

I've mupdated london from ff0c914 to the most recent commit on this PR, without any ledger-related issues stemming from NetworkInterface. Nexus has come back up (after a schema update), so I think we're covered on the sled-agent changes and good to go right after #5777.

FelixMcFelix added a commit that referenced this pull request Jun 26, 2024
This PR wires up all the backing machinery for VPC subnet routing, and
automatically resolves and pushes updated rules to sleds using an RPW.
This allows instances in all subnets of a VPC to talk with one another
-- assuming no firewall rules have been configured otherwise. At a high
level, this works by a few changes:
* During the VPC create saga, we now push two rules explicitly to the
system router -- default routes from `(0.0.0.0/0, ::/0) ->
inetgw:outbound`.
* Any CRUD operation on a VPC subnet will reconcile the set of VPC
subnet routes within the system router to have one entry per subnet.
This takes the form `subnet:{name} -> subnet:{name}` for each subnet,
which are later resolved to both v4 and v6 entries.
* Ports are created using route information known to sled-agent -- this
defaults to an empty route set for instances/probes, and an internet
gateway rule for services to enable early NTP sync.
* Routes are sync'd with sleds using a new background task. Broadly,
this asks each sled for the set of VPCs and subnets it has ports on, and
a version for the current route set installed in each. The background
task will use this information to determine which routes must be
rebuilt, and will send updated versions out in response.

The most immediate consequence in this PR is that hosts within a subnet
-- on different VPCs -- will be able to talk with one another at last.
The user facing API (#2116) will be re-enabled in a concurrent PR --
#5823 -- as will NIC spoof detection hole-punching.

Depends on oxidecomputer/opte#490.

Closes #2232, Fixes #1336.

---

A few pieces will block tests passing & merge-readiness:
- [x] Creation of a `lab-2.0-opte-0.32` image.
- [x] Merge of oxidecomputer/maghemite#274 (and updating all the right
SHAs in this PR).
Base automatically changed from felixmcfelix/vpc-subnet-routing to main June 26, 2024 20:22
@FelixMcFelix FelixMcFelix enabled auto-merge (squash) June 26, 2024 21:37
@FelixMcFelix FelixMcFelix merged commit 97fe552 into main Jun 26, 2024
21 checks passed
@FelixMcFelix FelixMcFelix deleted the felixmcfelix/vpc-subnets-user branch June 26, 2024 22:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request. networking Related to the networking.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Want Nexus API for managing VPC routing
3 participants